
Implement ConcurrentLfu events #795

Merged
bitfaster merged 26 commits into main from users/alexpeck/lfuevents10 on May 1, 2026
Conversation

@bitfaster (Owner) commented Apr 29, 2026

This PR implements events for ConcurrentLfu as a switchable policy. When disabled, all event code is fully elided at runtime by the JIT compiler.

Events are perf critical in ConcurrentLfu because the ItemRemoved logic is invoked as part of the maintenance cycle, which introduces overhead even when no handlers are registered. The Maintenance method's latency determines cache throughput at the limit, so any overhead here is undesirable. Later, these events could be captured in a list and processed asynchronously via the scheduler.

In this implementation, TryRemove defers event processing to the maintenance cycle (so TryRemove and policy-based eviction behave the same), whereas TryUpdate invokes the event handler synchronously at the call site.
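The deferral described above can be sketched as follows: removals are buffered on the fast path and handlers run later during maintenance. This is a hypothetical illustration, not the PR's actual implementation — the names `DeferredRemovalEvents`, `Defer`, and `Drain` are invented for the sketch:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch of deferring removal notifications to the maintenance
// cycle; names and structure are illustrative, not the PR's real types.
public sealed class DeferredRemovalEvents<K, V>
{
    private readonly ConcurrentQueue<(K Key, V Value)> pending =
        new ConcurrentQueue<(K Key, V Value)>();

    public event Action<K, V> ItemRemoved;

    // TryRemove path: record the removal, but do not raise the event yet,
    // keeping the remove fast path free of handler latency.
    public void Defer(K key, V value) => pending.Enqueue((key, value));

    // Called from the maintenance cycle: drain the buffer and raise handlers
    // there, so TryRemove and policy-based eviction notify identically.
    public void Drain()
    {
        while (pending.TryDequeue(out var item))
        {
            ItemRemoved?.Invoke(item.Key, item.Value);
        }
    }
}
```

The design choice this illustrates: deferral trades notification immediacy for a cheaper removal path, at the cost of handlers observing removals slightly later.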

Based on this earlier PR: #727

@coveralls commented Apr 29, 2026

Coverage Status

coverage: 99.192% (+0.01%) from 99.181% — users/alexpeck/lfuevents10 into main

@bitfaster (Owner, Author) commented Apr 29, 2026

Analysis performed by Claude; figures are code size in bytes:

Update path (TryUpdateEventInliner.OnUpdatedEvent)

| Method | EventPolicy | NoEventPolicy | Δ |
| --- | --- | --- | --- |
| AddOrUpdate | 897 | 802 | −95 |

Maintenance path (DoMaintenanceEvictEventInliner.OnRemovedEvent)

| Method | EventPolicy | NoEventPolicy | Δ |
| --- | --- | --- | --- |
| AddOrUpdate (drives writes) | 977 | 889 | −88 |
| OnWrite (drains buffered removes) | 1,503 | 1,391 | −112 |
| EvictFromMain (eviction loop) | 1,653 | 1,545 | −108 |
| Evict (single-victim teardown) | 358 | 260 | −98 |
| EventPolicy.OnItemRemoved (standalone) | 112 | not emitted | −112 |
| Maintenance | 2,467 | 2,495 | +28 (layout) |
| DoMaintenance | 1,285 | 1,302 | +17 (layout) |
| AfterWrite, EvictFromWindow, OptimizePartitioning, TryScheduleDrain, ScheduleAfterWrite, AdmitCandidate | identical | identical | 0 |

Assembly evidence

In EventPolicy Evict (and OnWrite):

```asm
mov  rcx, offset MT_BitFaster.Caching.ItemRemovedEventArgs<Int32, Int32>
call CORINFO_HELP_NEWSFAST                ; allocate args
...
call qword ptr [7FFC...]                  ; EventPolicy.OnItemRemoved(Int32, Int32, ItemRemovedReason)
call qword ptr [r14+18]                   ; invoke delegate
```

In NoEventPolicy Evict (and OnWrite): zero matches for ItemRemovedEventArgs, OnItemRemoved, or eventPolicy field loads. The epilogue goes straight from evictedCount++ to ret.

Verdict

EventInliner.IsEnabled = typeof(E) == typeof(EventPolicy<K,V>) is folded as a JIT-time constant per generic instantiation, eliminating the event branch and everything inside it: oldValue capture, delegate field loads, null check, args allocation, invocation.
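The elision pattern can be sketched as follows. The `typeof` comparison mirrors the expression quoted above, but the surrounding type bodies are assumptions for illustration, not the library's real code:

```csharp
using System;

// Illustrative sketch of policy-based event elision; the policy bodies here
// are assumptions, only the IsEnabled expression mirrors the PR's description.
public struct EventPolicy<K, V>
{
    public Action<K, V> ItemRemoved; // delegate field loaded only on the event path
}

public struct NoEventPolicy<K, V> { } // empty marker policy

public static class EventInliner<K, V, E> where E : struct
{
    // For struct generic arguments the JIT compiles a distinct body per
    // instantiation (no canonicalization), so this typeof comparison folds
    // to a JIT-time constant: true for EventPolicy, false for NoEventPolicy.
    public static bool IsEnabled => typeof(E) == typeof(EventPolicy<K, V>);
}

public class CacheCore<K, V, E> where E : struct
{
    private EventPolicy<K, V> events; // only meaningful when E is EventPolicy

    internal void Evict(K key, V value)
    {
        if (EventInliner<K, V, E>.IsEnabled)        // branch removed at JIT time
        {
            events.ItemRemoved?.Invoke(key, value); // field load, null check, and
        }                                           // invocation elided with the branch
    }
}
```

Because the branch condition is constant per instantiation, the NoEventPolicy body compiles as if the guarded block were never written.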

What causes the layout differences that increase code size?

The JIT compiles ConcurrentLfuCore<...EventPolicy> and ConcurrentLfuCore<...NoEventPolicy> as two separate methods
(struct generics → distinct codegen, no canonicalization). Even when the source is identical and no instructions are
added/removed, the emitted byte count can drift by tens of bytes from second-order effects:

  1. Branch encoding. x86 has jmp short (2 bytes, ±127B reach) vs jmp near (5 bytes). When the surrounding code shrinks
    because events were elided, some forward jumps that were near can switch to short — or vice versa. A single such flip
    is ±3 bytes.
  2. Register allocation differences. Registers r8–r15 require a 1-byte REX prefix; rax–rdi don't. Different live-range
    pressure between the two instantiations can shift one variable from rdi to r12, which silently grows every instruction
    touching it.
  3. Basic-block reordering. The JIT orders blocks by edge weight / heuristics. Different inlining decisions in callees
    can change perceived hotness and reorder blocks, which changes which edges are fall-through vs. branch.
  4. Alignment padding. The JIT inserts NOPs ahead of loop heads (often 16-byte alignment). When earlier code shrinks,
    the loop head's natural address shifts, so the padding changes.
  5. Profile counter placement. The tier-1 JIT inserts CORINFO_HELP_COUNTPROFILE32 calls; placement isn't bitwise
    identical across instantiations.

@bitfaster (Owner, Author) commented

End-to-end latency: main vs this branch (NoEventPolicy)

ConcurrentLfu.GetOrAdd hot path (key already present), capacity 9, 1 stripe. main's pre-events ConcurrentLfu was built as BitFaster.Caching.MainBaseline.dll and referenced via extern alias so both implementations coexist in the same benchmark process.
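The extern alias setup can be sketched roughly as follows. The `<Aliases>` reference metadata and `extern alias` directive are standard MSBuild/C#, but the exact project wiring shown here is an assumption, not taken from the PR:

```csharp
// Assumed project wiring for the baseline reference (illustrative):
//   <Reference Include="BitFaster.Caching.MainBaseline">
//     <Aliases>MainBaseline</Aliases>
//   </Reference>
extern alias MainBaseline;

// Both implementations now coexist in one process under distinct aliases:
using CurrentLfu  = BitFaster.Caching.Lfu.ConcurrentLfu<int, int>;
using BaselineLfu = MainBaseline::BitFaster.Caching.Lfu.ConcurrentLfu<int, int>;
```

Without the alias, both assemblies would expose identical fully qualified type names and the compiler could not distinguish them.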

| Scheduler | main ConcurrentLfu | this branch NoEventPolicy | Δ |
| --- | --- | --- | --- |
| BackgroundThread | 27.59 ns | 25.22 ns | −9% (within noise) |
| Foreground | 45.90 ns | 46.94 ns | +2% (within noise) |
| ThreadPool | 26.30 ns | 28.11 ns | +7% (within noise) |
| Null | 12.56 ns | 12.37 ns | −1.5% (parity) |

The Foreground and Null scheduler rows are the cleanest signal, since they avoid scheduler-side interference. The BackgroundThread and ThreadPool rows carry larger variance from scheduler timing and inter-thread coordination, not from code-path differences inside the cache.

@bitfaster bitfaster marked this pull request as ready for review April 29, 2026 23:15
@bitfaster (Owner, Author) commented

Before (main)

| Method | Mean | Error | StdDev | Ratio | Allocated |
| --- | --- | --- | --- | --- | --- |
| ConcurrentDictionary | 3.988 ns | 0.0489 ns | 0.0457 ns | 1.00 | - |
| ConcurrentLfuBackground | 27.655 ns | 0.3675 ns | 0.3438 ns | 6.94 | - |
| ConcurrentLfuForeround | 44.977 ns | 0.7981 ns | 0.7466 ns | 11.28 | - |
| ConcurrentLfuThreadPool | 30.827 ns | 0.4778 ns | 0.4469 ns | 7.73 | - |
| ConcurrentLfuNull | 12.427 ns | 0.1547 ns | 0.1447 ns | 3.12 | - |

After (8d4ee12)

| Method | Mean | Error | StdDev | Ratio | Allocated |
| --- | --- | --- | --- | --- | --- |
| ConcurrentDictionary | 3.522 ns | 0.0252 ns | 0.0223 ns | 1.00 | - |
| ConcurrentLfuBackground | 25.430 ns | 0.0973 ns | 0.0812 ns | 7.22 | - |
| ConcurrentLfuForeround | 46.249 ns | 0.3032 ns | 0.2836 ns | 13.13 | - |
| ConcurrentLfuThreadPool | 15.723 ns | 0.3457 ns | 0.4116 ns | 4.46 | - |
| ConcurrentLfuNull | 12.849 ns | 0.0553 ns | 0.0490 ns | 3.65 | - |

It is not clear why the ThreadPool result was so fast here; it appears to be a one-off and does not repro.

@bitfaster (Owner, Author) commented Apr 30, 2026

Multiple runs (events and no events are the same code):

(Image: Results_Evict_500 benchmark chart)

Review threads (resolved): BitFaster.Caching/Lfu/ConcurrentLfu.cs, BitFaster.Caching/Lfu/ConcurrentLfuCore.cs
Alex Peck added 2 commits April 30, 2026 10:49
@bitfaster bitfaster merged commit 0686f4e into main May 1, 2026
26 of 27 checks passed
@bitfaster bitfaster deleted the users/alexpeck/lfuevents10 branch May 1, 2026 00:25